Groundwater is the main source of water for domestic and agricultural purposes in arid and semi-arid regions where surface water availability is limited. To protect and manage the groundwater system effectively, a thorough understanding of groundwater quality is essential, as is the application of computational methods to simulate the complex, nonlinear groundwater system. Generally, three types of models, namely physically based models, conceptual models and black-box models, are applied to study the interconnected processes in subsurface media. In this study, Artificial Neural Networks (ANN) (three models with 1, 2 and 3 outputs) were used to simulate and predict the concentration of groundwater quality parameters, and a Mamdani Fuzzy Inference System (MFIS) was used to simulate the water quality indices. Classification algorithms of NEUROSHELL and MATLAB were used to predict the class of items in a data set; the model was constructed using already-labelled items of similar data sets. The WQI of 29 samples was determined using the weighted arithmetic index method. Based on MFIS, 10 samples were classified as ‘good’, four as ‘poor’, four as ‘very poor’ and the remaining 11 as ‘unsuitable’. The ANN simulation models were used to predict the concentration of groundwater quality parameters, and the values from the three ANN models fit the actual data well, with correlation coefficients ranging from 0.93 to 0.99. The soft computing techniques can be coupled with geospatial and geostatistical methods to map the spatial and temporal distribution of water quality parameters.
1. Introduction


The pollution of groundwater has been increasing over the years due to industrial development. Over-exploitation of groundwater resources also leads to a rapid drop in the groundwater level in many areas, which in turn results in groundwater contamination. Groundwater quality changes due to the discharge of sewage and to agricultural and irrigation activities, and its physical and chemical parameters are influenced by the geological formations and types of aquifers through which it passes. Numerical flow models are generally used to understand the subsurface system, but they demand a large amount of temporal and spatial information to depict it. Advanced statistical methods involving data mining techniques can therefore also be used to explain the underlying structure of the data and the hydrochemical processes occurring in the aquifers. Artificial Neural Networks (ANN) have been applied to simulate groundwater quality, and Geographic Information Systems (GIS) have been applied as pre-processing and post-processing tools in simulating groundwater quality in Iran, India, Turkey and the Ghareh-subasin. In addition to water quality parameters, land use and land cover patterns, geological factors and groundwater level have been used in these simulations [1-8]. The integration of ANN and GIS has proved more accurate and efficient in predicting and simulating groundwater quality, and ANN results have been shown to be more reliable than linear regression models [9-11]. Simulation-optimization models have also been developed by applying ANN and Particle Swarm Optimization (PSO) models, along with wireless networks, to manage groundwater resources [12-14]. Several authors have studied the influence of the number of hidden neurons in an ANN on the prediction and simulation of water quality parameters, and observed that the number of neurons in the hidden layers has to be identified by trial and error for each location. Groundwater pollution sources and groundwater levels have been predicted using feed-forward networks trained with the back-propagation algorithm [15-18].
These studies noted that the tangent transfer function with the momentum training algorithm gives lower error than the sigmoid function with the Levenberg-Marquardt algorithm [19,20]. Several researchers have applied fuzzy membership functions (Mamdani Fuzzy Inference System (MFIS)), with the weight of each groundwater quality parameter assigned by the analytic hierarchy process (AHP), which is based on pairwise comparison, to classify the groundwater quality of different well locations [21-24]. The water quality index (WQI) is a valuable and unique rating that depicts the overall water quality status in a single term, which helps in selecting an appropriate treatment technique for the issues concerned. WQI reflects the composite influence of different water quality parameters and communicates water quality information to the public and to legislative decision makers. A weighted groundwater quality index based on the spatial and temporal variations of groundwater quality has been developed using Fuzzy-AHP. A few papers have reported the use of a geostatistical approach combined with fuzzy logic to develop zoning maps by identifying the spatial distribution of groundwater quality [25-27].
An Adaptive Neural-Based Fuzzy Inference System (ANFIS) has been adopted for estimating and predicting pollutant levels in groundwater systems [28]. Deep learning algorithms and soft computing techniques have been applied to problems in engineering, geotechnical engineering, groundwater, sediment transport and meteorological characteristics [29-35].


2. Study Area 


Coimbatore district is one of the largest districts of Tamil Nadu, with an areal extent of 7470 km², accounting for 5.74% of the total geographical area of Tamil Nadu (Fig. 1). It consists of 19 blocks and is part of the Cauvery sub-basins, such as Bhavani, Noyyal, Amaravathy, Parambikulam, Aliyar and Valparai. About 87% of the total irrigated area is irrigated through dug wells. The annual rainfall over the district varies from 550 mm to 900 mm. Shallow aquifers exist within 30 m in most parts of the district except in the west. Structural hills, deep pediments and valley fills are the most prominent geomorphic units identified in the area. Six major soil types cover the district: red calcareous soil, black soil, red non-calcareous soil, alluvial and colluvial soil, brown soil and forest soil. The alluvium and colluvium formations in the district are composed of silt, kankar, sand and gravel beds. Groundwater levels fluctuate widely because of over-exploitation for domestic and agricultural activities. The Central Groundwater Department has reported that 15 of the 19 blocks are either ‘over exploited’ or ‘critical’. Regarding quality, total hardness, nitrate and fluoride are found in excess of permissible limits due to industrial pollution, geological formations and agricultural activities. The groundwater quality in many areas of the district is also reported not to conform to drinking water standards, so proper planning is required.
Fig. 1. Study area Coimbatore in Tamil Nadu, India [36].
3. Methodology


In this study, Artificial Neural Networks (three models with 1, 2 and 3 outputs) were used to simulate and predict the concentration of groundwater quality parameters, and a Mamdani Fuzzy Inference System was used to simulate the water quality indices. The FIS is the key unit of a fuzzy logic system, and its primary task is decision making. It uses “IF...THEN” rules together with the connectors “OR” and “AND” to draw the essential decision rules. The Mamdani FIS was proposed in 1975 by Ebrahim Mamdani. It was originally intended to control a steam engine and boiler combination by synthesizing a set of fuzzy rules obtained from people working on the system. Classification algorithms in NEUROSHELL and MATLAB were used to predict the class of items in a data set using a classifier model. The model was constructed using already-labelled items of similar data sets; because it learns from labelled examples, classification is considered a supervised machine learning method.
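In a Mamdani FIS the “AND” and “OR” connectors are most commonly evaluated as the minimum and the maximum of the antecedent membership degrees. The sketch below shows how the firing strength of a rule could be computed under that convention; the membership degrees and the two rules are hypothetical and are not taken from this study.

```python
def fuzzy_and(*degrees: float) -> float:
    """Mamdani AND connector: minimum of the membership degrees."""
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    """Mamdani OR connector: maximum of the membership degrees."""
    return max(degrees)


# Hypothetical membership degrees of one water sample
mu_ph_good = 0.7        # degree to which pH is "good"
mu_tds_poor = 0.4       # degree to which TDS is "poor"
mu_tds_very_poor = 0.2  # degree to which TDS is "very poor"

# IF pH is good AND TDS is poor THEN WQI is poor
strength_rule_1 = fuzzy_and(mu_ph_good, mu_tds_poor)
# IF TDS is poor OR TDS is very poor THEN WQI is very poor
strength_rule_2 = fuzzy_or(mu_tds_poor, mu_tds_very_poor)

print(strength_rule_1, strength_rule_2)  # 0.4 0.4
```

The firing strength is later used to clip the output membership function of each rule before aggregation and defuzzification.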


ANN is a computational tool designed to simulate the way in which the brain performs a particular task or function of interest. ANNs are made up of highly interconnected processing elements, called artificial neurons, whose weighted connections constitute a network. Artificial neurons are the information-processing units used to build neural networks, and they are very primitive in comparison with those found in the brain. Each neuron receives several inputs from neighbouring elements but sends only one output. The four basic elements of the neuron model are synapses (connecting links), an adder, an activation function and a bias. Synapses are characterised by weights: a signal x_j at the input of synapse j connected to a neuron is multiplied by the synaptic weight w_j. The adder sums the input signals to the neuron, weighted by the respective synapses. The activation function limits the amplitude of the neuron's output; typically, the normalised output amplitude lies in the closed unit interval [0, 1] or alternatively [-1, 1]. The neuron model also includes an external bias, denoted by b, which increases or lowers the net input of the activation function. The network consists of three layers, namely the input layer, the hidden layer and the output layer: the input layer receives information from the outside world, the output layer communicates the simulation results to the outside world, and the hidden layer connects the two.
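The four elements of the neuron model can be sketched in a few lines: the adder forms v = Σ w_j x_j + b and the activation function limits the output. The inputs, weights and bias below are hypothetical; the logistic function keeps the output between 0 and 1, and the hyperbolic tangent keeps it between -1 and 1.

```python
import math


def logistic(v: float) -> float:
    """Sigmoid (logistic) activation; output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x, w, b, activation=logistic):
    """Single artificial neuron: y = activation(sum_j w_j * x_j + b)."""
    if len(x) != len(w):
        raise ValueError("each input signal x_j needs one synaptic weight w_j")
    v = sum(wj * xj for wj, xj in zip(w, x)) + b  # adder output plus bias
    return activation(v)


# Hypothetical normalised inputs, synaptic weights and bias
x = [0.2, 0.5, 0.9]
w = [0.4, -0.3, 0.8]
y_sigmoid = neuron_output(x, w, b=-0.1)                     # in (0, 1)
y_tanh = neuron_output(x, w, b=-0.1, activation=math.tanh)  # in (-1, 1)
```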


3.1. Feedforward network 


Generally, the ANN model consists of three layers. The data are fed forward from the input layer and processed in the hidden layer(s) using an activation function; the output of the second layer is sent as input to the third layer, so the data flow only in the forward direction and the network is acyclic.
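A forward pass through such a three-layer network can be sketched as two applications of the same fully connected layer. The 3-2-1 layout, the weights and the use of the logistic function in both layers are illustrative assumptions, not the configuration used in this study.

```python
import math


def sigmoid(v):
    """Logistic activation used in every layer of this sketch."""
    return 1.0 / (1.0 + math.exp(-v))


def layer_forward(inputs, weights, biases):
    """Fully connected layer: each row of `weights` feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feedforward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer -> output layer; no feedback connections."""
    hidden = layer_forward(inputs, w_hidden, b_hidden)
    return layer_forward(hidden, w_out, b_out)


# Hypothetical 3-2-1 network (3 inputs, 2 hidden neurons, 1 output)
w_hidden = [[0.1, -0.2, 0.3], [0.5, 0.4, -0.6]]
b_hidden = [0.0, 0.1]
w_out = [[0.7, -0.5]]
b_out = [0.2]
prediction = feedforward([0.2, 0.5, 0.9], w_hidden, b_hidden, w_out, b_out)
```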


3.2. Learning the pattern of the data for classification 


The neural network learns the pattern of the data through a training algorithm that adjusts the weights in each layer. Signals are passed forward from the input layer through summation and activation, and the error obtained by comparing the model output with the target values is back-propagated from the output layer. The back-propagation algorithm involves the following steps: 1. initializing the connection weights of the neural network; 2. applying an activation function, such as the threshold function, the piecewise-linear function or the sigmoid (logistic) function, to determine the output of each layer and the final value from the output layer; 3. computing the model output and comparing it with the defined target value to determine the error; and 4. back-propagating the error and calculating the new weights in each layer. The weights and biases in each layer are then updated using the delta rule. The gradient descent algorithm searches for the minimum of the error in weight space by changing the weights in the direction that reduces the error e(n) at iteration n. These steps are repeated until the network is trained and the error reaches its minimum. To avoid overfitting to noise, the groundwater quality data are divided into training and testing sets. Training may be performed in batch or online mode, and either ‘supervised learning’ or ‘unsupervised learning’ may be used to train the network.
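The four back-propagation steps and the delta-rule update can be sketched for a network with one hidden layer. This is a minimal illustration assuming sigmoid units in both layers, online (per-sample) updates, a fixed learning rate and no momentum term; it is not the NeuroShell implementation, and the four training samples are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def train_backprop(data, n_hidden=4, lr=0.5, epochs=1000, seed=0):
    """Online back-propagation for a one-hidden-layer sigmoid network.

    data: list of (inputs, target) pairs scaled to [0, 1].
    Returns hidden weights, output weights and the mean squared error per epoch.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: initialise weights; the last entry of each row is the bias weight
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in data:
            # Step 2: forward pass (a constant 1.0 input carries the bias)
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = sigmoid(sum(w * hi for w, hi in zip(w_o, hb)))
            # Step 3: error e(n) between target and model output
            e = t - y
            sq_err += e * e
            # Step 4: local gradients (delta rule), computed before any update
            delta_o = e * y * (1.0 - y)
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Gradient-descent updates: move each weight down the error slope
            w_o = [w + lr * delta_o * hi for w, hi in zip(w_o, hb)]
            for j in range(n_hidden):
                w_h[j] = [w + lr * delta_h[j] * xi for w, xi in zip(w_h[j], xb)]
        history.append(sq_err / len(data))
    return w_h, w_o, history


# Hypothetical normalised samples: two inputs and one target each
samples = [([0.0, 0.0], 0.1), ([0.0, 1.0], 0.5), ([1.0, 0.0], 0.5), ([1.0, 1.0], 0.9)]
w_h, w_o, mse_per_epoch = train_backprop(samples)
```

Batch mode would instead accumulate the weight changes over all samples and apply them once per epoch.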


4. Modelling and simulation 


4.1. Neural networks 


In this study, 29 water samples were collected from 29 wells located in the selected area, and 10 physicochemical water quality parameters, such as pH, carbonate (CO₃²⁻), bicarbonate (HCO₃⁻), chloride (Cl⁻), sulfate (SO₄²⁻), calcium (Ca²⁺), magnesium (Mg²⁺), potassium (K⁺) and total dissolved solids (TDS), were analysed using standard methods [37]. Nine water quality parameters were given as inputs and TDS as the output of the Neuroshell ANN. The statistical parameters of the data are given in Table 1.


Table 1 
Statistical parameters of the data. 
Variable | pH | ? | ? | CO₃²⁻ | HCO₃⁻ | SO₄²⁻ | ? | ? | ? | TDS
Role | Input | Input | Input | Input | Input | Input | Input | Input | Input | Output
Min. | 6.5 | 0.9 | 222.1 | 0 | 100 | 100 | 200 | 75 | 75 | 603
Max. | 8 | 6.3 | 2551.3 | 7.5 | 485 | 700 | 2055 | 875 | 1450 | 4623
Mean | 7 | 4.2 | 923.6 | 3.75 | 350.6 | 296.8 | 704.0 | 325 | 379.1 | 1915
SD | 0.49 | 1.5 | 674.3 | 1.31 | 106.7 | 161.1 | 485.6 | 223.4 | 304.7 | 1132


From the data, 60% was extracted as the training set and 30% as the testing set, the latter being used to find the optimum point at which to interrupt model training. A standard net architecture was then created in which each layer is connected only to the previous layer. The back-propagation architecture Wardnet was used as the training algorithm; it has multiple hidden neurons with different activation functions in one layer, loosely analogous to neurons in the human brain. Before the output is produced, this architecture combines the outputs of all hidden neurons, so all input parameters are treated as interrelated, as the water quality parameters of the study area are. The optimum point, which is the minimum of the average error on the test data, sets the end point of training: beyond it, the average error on the training set keeps decreasing as the number of iterations increases, but the average error on the test set increases. If training continues beyond the optimum point, the model fits only the training data. The network is trained until the minimum average error is reached, and statistical parameters such as R squared, r squared, mean square error, mean absolute error and the correlation coefficient are then determined.
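The 60/30 split and the choice of the stopping point can be sketched as follows. The shuffling, the rounding of set sizes and the hypothetical test-error curve are assumptions; the text does not say how the remaining 10% of the data is used, so the sketch simply returns it as a leftover set.

```python
import random


def split_records(records, train_frac=0.6, test_frac=0.3, seed=42):
    """Shuffle records and split them into training, testing and leftover sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_test = round(test_frac * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    leftover = shuffled[n_train + n_test:]  # use of this share is not described in the text
    return train, test, leftover


def optimum_epoch(test_errors):
    """Index of the epoch with the minimum average test-set error (stopping point)."""
    return min(range(len(test_errors)), key=test_errors.__getitem__)


# 29 wells split 60/30; a hypothetical test-error curve that rises after epoch 2
train, test, leftover = split_records(range(29))
stop_at = optimum_epoch([0.30, 0.21, 0.18, 0.19, 0.25])
```

With 29 wells this gives 17 training records, 9 testing records and 3 leftover records; training would be stopped at the epoch returned by `optimum_epoch`.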


4.2. Fuzzy logic 


In this study, a fuzzy inference system is applied to classify the water quality index. Fuzzy sets are suitable for decision making in complex and uncertain systems. The membership functions were created for ten water quality parameters based on the Bureau of Indian Standards (BIS) and World Health Organization (WHO) criteria. The Water Quality Index (WQI) may be defined as a rating reflecting the composite influence of different water quality parameters on the overall quality of water. The main objective of computing a WQI is to turn complex water quality data into information that is easily understandable and usable. The weighted arithmetic water quality index method classifies water quality according to its degree of purity using the most commonly measured water quality variables. In this study, WQI has been classified into five classes, “excellent”, “good”, “poor”, “very poor” and “unsuitable for drinking”, as shown in Table 2. Creating the fuzzy inputs, the membership functions and the rules are the important steps in determining WQI with an FIS. The values of five major water quality parameters are fuzzified based on normalization, and membership functions are created to identify the degree of membership in each class. Based on the BIS and WHO standards, the best value is chosen as the excellent category and the worst value as the last value of the poor category. The fuzzy distribution is used to generate membership functions for the various water quality parameters.


Table 2 
The criterion for water quality index. 
WQI value Water quality 
0-25 excellent 
25-50 good 
50-75 poor 
75-100 Very poor 
>100 Unsuitable for drinking 
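The weighted arithmetic index is not written out in this section. A minimal sketch of its commonly used form is given below, assuming quality ratings Q_i = 100 (V_i - V_io)/(S_i - V_io) and unit weights W_i = K/S_i with K = 1/Σ(1/S_i), where V_i is the observed value, S_i the standard value and V_io the ideal value (0 for most parameters, 7.0 for pH), followed by the classes of Table 2. The standard values in the example are placeholders to be replaced by the BIS/WHO limits actually adopted, and treating each upper bound of Table 2 as inclusive is also an assumption, since adjacent ranges share their end points.

```python
def weighted_arithmetic_wqi(observed, standard, ideal):
    """WQI = sum(Q_i * W_i) / sum(W_i) over the parameters in `observed`.

    Q_i = 100 * (V_i - V_io) / (S_i - V_io)   quality rating
    W_i = K / S_i, K = 1 / sum(1 / S_i)       unit weight
    """
    k = 1.0 / sum(1.0 / standard[name] for name in observed)
    numerator = denominator = 0.0
    for name, value in observed.items():
        s = standard[name]
        v0 = ideal.get(name, 0.0)  # ideal value: 0 unless given (e.g. 7.0 for pH)
        q = 100.0 * (value - v0) / (s - v0)
        w = k / s
        numerator += q * w
        denominator += w
    return numerator / denominator


def classify_wqi(wqi):
    """Table 2 classes; each upper bound is treated as inclusive (assumption)."""
    for upper, label in ((25, "Excellent"), (50, "Good"), (75, "Poor"), (100, "Very poor")):
        if wqi <= upper:
            return label
    return "Unsuitable for drinking"


# Hypothetical sample and placeholder standards (replace with the adopted limits)
sample = {"pH": 7.4, "TDS": 1072.0, "TH": 240.0}
standards = {"pH": 8.5, "TDS": 500.0, "TH": 300.0}
wqi = weighted_arithmetic_wqi(sample, standards, ideal={"pH": 7.0})
label = classify_wqi(wqi)
```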


4.3. Fuzzy distribution 


Let r_A be reference value A (corner point A, the average of the minimum and mean values) and r_B be reference value B (corner point B, the average of the mean and maximum values). The widths are then defined as W1 = [min, r_A], W2 = [r_A, r_B] and W3 = [r_B, max], where W1 is the width of the left triangle of the trapezoid, W2 is the width of the central rectangle and W3 is the width of the right triangle. The height of the distribution is normalized to [0, 1]. A single trapezoidal distribution used as a membership function has no intersection area, as shown in Fig. 2, whereas the fuzzy set has an intersection area for each membership function. Table 3 shows the membership functions for pH; membership functions were created similarly for the other water quality parameters and for the water quality index, as given in Fig. 3 to Fig. 7.
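The trapezoid described above can be built directly from the minimum, mean and maximum of a parameter. This sketch assumes that membership rises linearly over W1, equals 1 over W2 and falls linearly over W3; the example uses the TDS statistics of Table 1 (minimum 603, mean 1915, maximum 4623), which give r_A = 1259 and r_B = 3269.

```python
def reference_points(vmin, vmean, vmax):
    """Corner points A and B: averages of (min, mean) and (mean, max)."""
    return (vmin + vmean) / 2.0, (vmean + vmax) / 2.0


def trapezoid(x, vmin, r_a, r_b, vmax):
    """Trapezoidal membership over W1 = [vmin, r_a], W2 = [r_a, r_b], W3 = [r_b, vmax]."""
    if x <= vmin or x >= vmax:
        return 0.0
    if x < r_a:
        return (x - vmin) / (r_a - vmin)  # left triangle: rising edge over W1
    if x <= r_b:
        return 1.0                        # flat top over W2, height normalised to 1
    return (vmax - x) / (vmax - r_b)      # right triangle: falling edge over W3


# TDS statistics from Table 1
r_a, r_b = reference_points(603.0, 1915.0, 4623.0)
mu_mid = trapezoid(2000.0, 603.0, r_a, r_b, 4623.0)   # on the flat top
mu_left = trapezoid(931.0, 603.0, r_a, r_b, 4623.0)   # halfway up the left edge
```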


Table 3 
Membership functions for water quality parameter pH. 
Parameter | Classification | Min1 | Max1 | Mean1 | Min2 | Max2 | Mean2
pH | Excellent | 7.00 | 8.5 | 7.75 | | |
pH | Good | 6.88 | 7.00 | 6.94 | 8.5 | 8.65 | 8.58
pH | Poor | 6.75 | 6.88 | 6.81 | 8.65 | 8.8 | 8.73
pH | Very poor | 6.5 | 6.75 | 6.63 | 8.80 | 9.2 | 8.99
pH | Unsuitable for drinking | 0.00 | 6.5 | 3.25 | 9.20 | 14 | 11.6


Fig. 2. A trapezoid membership-function.
Fig. 3. pH membership-function.
Fig. 4. TDS membership-function.
Fig. 5. TH membership-function.
Fig. 7. SO₄ membership-function (input variable SO₄; classes: Excellent, Good, Poor, Very poor, Unsuitable).


Once the membership functions are created, the rules are formulated based on the min-max method. Since the objective of the work is to classify irrigation water quality based on the WQI value, the rules must be generated for five classes: unsuitable for drinking, very poor, poor, good and excellent. To formulate the rules efficiently, field knowledge of the impact of water quality parameters on irrigation is essential. A different set of rules was generated for each of the five classes, as given below:

Unsuitable: If the values of two water quality parameters are above the standards, the water is unsuitable for irrigation and must be treated before it is applied for irrigation. Seven rules have been written for this class.

Very poor: Condition 1: if the values of two water quality parameters are very poor, WQI is very poor (7 rules are generated for this condition); Condition 2: if the value of one parameter is unsuitable for irrigation and the other parameter is suitable, WQI is very poor.

Poor: If the values of two water quality parameters are poor, WQI is poor. Seven rules are generated for this class.

Good: Condition 1: if the values of two water quality parameters are good, WQI is good; Condition 2: if the value of one water quality parameter is excellent and the other is not, WQI is good. Seven rules are generated for this class.

Excellent: If both water quality parameters are excellent, WQI is excellent. Seven rules have been generated for this class.

In total, 35 rules were generated for the five WQI classes. Finally, defuzzification is done using the centroid method to obtain the Water Quality Index value.
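The min-max inference and centroid defuzzification described above can be sketched in plain Python; the study itself used MATLAB, and this code does not reproduce that toolbox. The triangular output sets for WQI (peaked at the mid-points of the Table 2 ranges), the discretisation step of 0.5 over 0 to 150 and the three two-antecedent rules are all hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical output sets for WQI, peaked at the mid-points of the Table 2 ranges
WQI_SETS = {
    "excellent": (-12.5, 12.5, 37.5),
    "good": (12.5, 37.5, 62.5),
    "poor": (37.5, 62.5, 87.5),
    "very poor": (62.5, 87.5, 112.5),
    "unsuitable": (87.5, 125.0, 162.5),
}


def mamdani_wqi(degrees, rules, step=0.5, upper=150.0):
    """Min-max Mamdani inference followed by centroid defuzzification.

    degrees: {parameter: {class: membership degree}} for one sample
    rules:   [((param_1, class_1), (param_2, class_2), output_class), ...]
    """
    strength = {}
    for (p1, c1), (p2, c2), out in rules:
        fire = min(degrees[p1].get(c1, 0.0), degrees[p2].get(c2, 0.0))  # AND = min
        strength[out] = max(strength.get(out, 0.0), fire)                 # OR = max
    num = den = 0.0
    for i in range(int(upper / step) + 1):
        x = i * step
        # Clip each output set at its rule strength (min), aggregate with max
        mu = max((min(s, tri(x, *WQI_SETS[out])) for out, s in strength.items()), default=0.0)
        num += x * mu
        den += mu
    return num / den if den > 0 else float("nan")


# Hypothetical rules on two parameters and one fully "good" sample
rules = [
    (("pH", "good"), ("TDS", "good"), "good"),
    (("pH", "poor"), ("TDS", "poor"), "poor"),
    (("pH", "very poor"), ("TDS", "very poor"), "very poor"),
]
wqi_good = mamdani_wqi({"pH": {"good": 1.0}, "TDS": {"good": 1.0}}, rules)
```

When only one rule fires fully, the centroid falls at the peak of that output set; when several rules fire, the result lies between their centroids, weighted by the clipped areas.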
5. Results and discussion


5.1. Neural networks 


The neural network software Neuroshell 2.0 was used to build and run ANN models with one hidden layer of 10 neurons and a sigmoid activation function on normalized values, for one, two and three outputs, with momentum factors of 0.3, 0.99 and 0.99 and learning rates of 0.1, 0.08 and 0.1, respectively. When the momentum factor and learning rate were increased further, there was no further change in the error. With the back-propagation algorithm, new weights were calculated by distributing the error back through the network. When the momentum factor was increased to 1, larger values were assigned to the new weights, which increased the minimum average error. For one output (TDS, Model A), the best training results were obtained with a momentum factor of 0.3 and a learning rate of 0.1; increasing the momentum factor and learning rate reduced the computational time required to train the network. For two outputs (TDS and TH, Model B) and three outputs (pH, TDS and TH, Model C), a minimum average error of 0.0002 was obtained with a momentum factor of 0.99, with optimum learning rates of 0.08 and 0.1, respectively (Fig. 8 to Fig. 10). The performance values of the three ANN models (Table 4) show that Model B, with two outputs (TDS and TH), gave the best prediction results (Fig. 11) compared with Models A and C.
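The role of the momentum factor can be illustrated with the usual momentum form of the weight update, Δw(n) = α Δw(n-1) - η ∂E/∂w. The sketch applies it to a hypothetical one-weight error surface E(w) = (w - 2)², using the learning rate of 0.1 and the momentum factors 0.3 and 0.99 mentioned above; it only shows how a momentum close to 1 carries large weight changes forward, and does not reproduce the NeuroShell runs.

```python
def momentum_descent(grad, w0, lr, momentum, steps):
    """Gradient descent with momentum: dw(n) = momentum * dw(n-1) - lr * grad(w)."""
    w, dw = w0, 0.0
    path = [w]
    for _ in range(steps):
        dw = momentum * dw - lr * grad(w)  # previous change carried forward
        w += dw
        path.append(w)
    return path


def grad(w):
    """Gradient of the hypothetical error surface E(w) = (w - 2)^2."""
    return 2.0 * (w - 2.0)


path_low = momentum_descent(grad, 0.0, lr=0.1, momentum=0.3, steps=50)
path_high = momentum_descent(grad, 0.0, lr=0.1, momentum=0.99, steps=50)
```

After 50 updates the run with momentum 0.3 has settled at the minimum w = 2, while the run with momentum 0.99 is still oscillating around it.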


Table 4 
Performance of three different ANN Models. 
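Statistical measure | 1 output (Model A) | 2 outputs (Model B) | 2 outputs (Model B) | 3 outputs (Model C) | 3 outputs (Model C) | 3 outputs (Model C)
Parameter | TDS | TH | TDS | pH | TH | TDS
r squared | 0.94 | 0.98 | 0.96 | 0.87 | 0.96 | 0.92
Mean absolute error | 220.5 | 44.9 | 115.5 | 0.12 | 75.85 | 196.8
Correlation coefficient r | 0.97 | 0.99 | 0.98 | 0.93 | 0.98 | 0.96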
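The measures in Table 4 can be recomputed from paired actual and predicted values. In Table 4 every r squared value equals the square of r to two decimals, so this sketch computes r squared as r**2; NeuroShell's own definitions are not reproduced here. The actual values in the example are TDS values from Table 5, and the predictions paired with them are hypothetical.

```python
import math


def performance(actual, predicted):
    """Correlation coefficient r, r squared (= r**2) and mean absolute error."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_a * var_p)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    return {"r": r, "r_squared": r * r, "mae": mae}


# Actual TDS of four wells (Table 5) against hypothetical model predictions
perf = performance([603.0, 938.0, 1072.0, 2211.0], [650.0, 900.0, 1100.0, 2150.0])
```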
Fig. 8. Comparison of the momentum and learning-rate weights with the number of epochs for the 1-output ANN model (Model A); x axis: weight of factor.
Fig. 9. Comparison of the momentum and learning-rate weights with the number of epochs for the 2-output ANN model (Model B).
Fig. 10. Comparison of the momentum and learning-rate weights with the number of epochs for the 3-output ANN model (Model C).
Fig. 11. The comparison between actual values and ANN model (B) results.
5.2. Fuzzy logic


A fuzzy inference system (FIS) can be used for decision making when there is uncertainty in the system. The output is obtained in one of two ways, both based on the selection of rules. In the first, if only one rule is active, the FIS uses the mean of the membership-function range. In the second, the ranges of all the membership functions (the classifications of the five water quality parameters) are taken into account in calculating the final output: the maximum value of the activated rules in each category is multiplied by the weight of the membership function to obtain the final output. Fig. 12 shows the weight and centroid of a membership function.


Fig. 12. The weight and centroid of membership-function (centroids x1 and y1).


The weight of a membership function ranges from 0 to 1. The FIS then calculates the output using Eq. (1):

$$\bar{x} = \frac{w_1 x_1 + w_2 y_1}{w_1 + w_2} \qquad (1)$$

where w1 and w2 are the weights of the activated membership functions and x1 and y1 are their centroid positions on the x axis. In this study, for the WQI value to be in the good category, no water quality parameter (out of five) may lie in the range of the unsuitable membership function. For the WQI value to be in the poor category, at most two of the five water quality parameters can lie in the unsuitable range. Similarly, for the very poor classification of WQI, at most three water quality parameters can lie in the unsuitable range. When at least two water quality parameters lie in the unsuitable range, the water is unsuitable for irrigation. The comparison between the water quality index classified using fuzzy logic and the actual WQI is shown in Table 5 and Fig. 13. The fuzzy logic classification agrees closely with the actual WQI, and it helps in deciding whether the water can be used as it is or must be treated before it is used for any purpose.
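Eq. (1) is a weighted average of the centroids of the activated membership functions. A minimal sketch, generalised to any number of activated sets; the two weights and the centroids (taken as the mid-points of the good and poor ranges of Table 2) are hypothetical.

```python
def weighted_centroid(weights, centroids):
    """Eq. (1): sum(w_i * c_i) / sum(w_i) over the activated membership functions."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no membership function is activated")
    return sum(w * c for w, c in zip(weights, centroids)) / total


# Two activated output sets: "good" (weight w1 = 0.6, centroid x1 = 37.5) and
# "poor" (weight w2 = 0.4, centroid y1 = 62.5); both centroids are hypothetical
wqi = weighted_centroid([0.6, 0.4], [37.5, 62.5])  # about 47.5, "good" in Table 2
```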
Table 5
Results of fuzzy simulation.
SNo | pH | TDS (mg/l) | TH (mg/l) | ? | SO₄²⁻ | WQI (actual) | WQI (fuzzy output) | Classification
1 | 1 | 603 | ? | 222 | 110 | 26 | ? | Good
2 | 1 | 670 | 260 | 237 | 100 | 27 | 22 | Good
3 | 8.0 | 938 | 300 | 395 | 140 | 31 | 38 | Good
4 | ? | 1072 | 240 | 390 | 130 | 31 | 38 | Good
5 | 7.7 | 1072 | 490 | 405 | 145 | 37 | 35 | Good
6 | ? | 938 | 325 | 400 | 160 | 32 | 34 | Good
7 | 8.0 | 1005 | 200 | 400 | 130 | ? | 31 | Good
8 | 8.0 | 1072 | 325 | 390 | 110 | ? | 40 | Good
9 | 7.0 | 1139 | 315 | 321 | 295 | 49 | 43 | Good
10 | 7.6 | 2211 | 900 | 1036 | 150 | 107 | 120 | Unsuitable
11 | 6.8 | 2814 | 770 | 1352 | 215 | 129 | 121 | Unsuitable
12 | 6.7 | 3685 | 1370 | 2551 | 380 | 122 | 127 | Unsuitable
13 | 6.7 | 4623 | 1950 | 2398 | 440 | 124 | 121 | Unsuitable
14 | 6.6 | 3417 | 1225 | 1767 | 390 | 122 | 121 | Unsuitable
15 | 6.9 | 2412 | 785 | 1209 | 210 | 94 | 95 | Very poor
16 | 6.8 | 1541 | 865 | 1476 | 270 | 105 | 106 | Unsuitable
17 | 6.6 | 4154 | 1385 | 1392 | 330 | 124 | 130 | Unsuitable
18 | 6.5 | 4020 | 2055 | 2028 | 590 | 136 | 133 | Unsuitable
19 | 6.7 | 2881 | 195 | 1925 | 420 | 123 | 126 | Unsuitable
20 | 6.9 | 1072 | 440 | 553 | 160 | 51 | 58 | Poor
21 | 6.6 | 2345 | 620 | 1189 | 410 | 115 | 118 | Unsuitable
22 | 6.9 | 1139 | 295 | 498 | 200 | 48 | 47 | Good
23 | 6.8 | 1809 | 535 | 829 | 300 | 79 | 76 | Very poor
24 | 7.0 | 1541 | 600 | 543 | 500 | 86 | 80 | Very poor
25 | ? | 123 | 650 | 494 | 420 | ? | ? | Poor
26 | 7 | 1608 | 768 | 691 | 423 | 94 | 99 | Very poor
27 | 6.9 | 1675 | 825 | 632 | 700 | 118 | 115 | Unsuitable
28 | ? | 1206 | 500 | 474 | 270 | ? | 54 | Poor
29 | 12 | 1608 | 315 | 592 | 510 | 65 | 57 | Poor
Fig. 13. Comparison of actual WQI to that classified using fuzzy logic inference system.
6. Conclusions 


Groundwater is an important resource for irrigation, drinking and other agricultural needs. It is important to study and develop new methods and strategies to understand the vulnerability of groundwater to agricultural chemicals and other human activities for its better management. This study simulates and predicts groundwater quality in a study area with complex pollution sources such as agricultural, domestic and industrial effluents. The membership functions and rules were formulated based on the variation in the water quality parameters and the significance of each parameter under field conditions. The water quality of 29 wells located in the area was classified and predicted using soft computing techniques, namely ANN and a fuzzy logic system, implemented with NEUROSHELL and MATLAB. The input parameters and membership functions for the fuzzy inference system were selected based on field experience. The calculated WQI of the samples was compared with the values simulated by the fuzzy inference system: samples from 10 wells were classified as ‘good’, four as ‘poor’, four as ‘very poor’ and the remaining 11 as unsuitable. A distinctive feature of the fuzzy logic technique is the use of membership-function centroids, which allows several rules to be activated at the same time. The ANN simulation and prediction results showed a high correlation between actual and model values, with correlation coefficients ranging from 0.93 to 0.99. It was concluded that both the fuzzy logic and the ANN (2 outputs) simulation and prediction models, which showed high accuracy, may be used to classify wells located in areas polluted by complex sources.